A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations

Authors

  • Ping Li
  • Kenneth Ward Church
Abstract

We should not have to look at the entire corpus (e.g., the Web) to know if two (or more) words are strongly associated or not. One can often obtain estimates of associations from a small sample. We develop a sketch-based algorithm that constructs a contingency table for a sample. One can estimate the contingency table for the entire population using straightforward scaling. However, one can do better by taking advantage of the margins (also known as document frequencies). The proposed method cuts the errors roughly in half over Broder’s sketches.
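The "straightforward scaling" baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the sample is simply a known subset of document IDs and that each word is represented by the set of documents containing it. The function names, the cell labels (a, b, c, d), and the toy data are illustrative assumptions, not the authors' notation or their sketch construction, and the margin-based refinement is not shown.

```python
def sample_contingency_table(postings_1, postings_2, sample_docs):
    """Build a 2x2 contingency table restricted to the sampled documents.

    postings_1, postings_2: iterables of document IDs containing each word.
    sample_docs: iterable of document IDs that make up the sample.
    Returns (a, b, c, d): both words, word 1 only, word 2 only, neither.
    """
    sample = set(sample_docs)
    p1 = set(postings_1) & sample
    p2 = set(postings_2) & sample
    a = len(p1 & p2)              # sampled docs containing both words
    b = len(p1 - p2)              # sampled docs containing only word 1
    c = len(p2 - p1)              # sampled docs containing only word 2
    d = len(sample) - a - b - c   # sampled docs containing neither word
    return a, b, c, d


def scale_to_population(table, sample_size, population_size):
    """Scale every cell of the sample table by population_size / sample_size."""
    factor = population_size / sample_size
    return tuple(cell * factor for cell in table)


# Toy data (hypothetical): a corpus of 10 documents, of which 5 are sampled.
word_1_docs = {1, 2, 3, 5, 8}
word_2_docs = {2, 3, 4, 8, 9}
sampled_ids = range(1, 6)

sample_table = sample_contingency_table(word_1_docs, word_2_docs, sampled_ids)
estimate = scale_to_population(sample_table, sample_size=5, population_size=10)
```

Scaling ignores the margins (the document frequencies of each word), which are typically known exactly. That is the information the abstract says can be exploited to do better.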

Journal:
  • Computational Linguistics

Volume 33  Issue 

Pages  -

Publication date: 2007